Active appearance models for photorealistic visual speech synthesis
Abstract
The perceived quality of a synthetic visual speech signal greatly depends on the smoothness of the presented visual articulators. This paper explains how concatenative visual speech synthesis systems can apply active appearance models to achieve a smooth and natural visual output speech. By modeling the visual speech contained in the system’s speech database, a diversification between the synthesis of the shape and the texture of the talking head is feasible. This allows the system to accurately balance between the articulation strength of the visual articulators and the signal smoothness of the visual mode in order to optimize the synthesis. To improve the synthesis quality, an automatic database normalization strategy has been designed that removes variations from the database which are not related to speech production. As was verified by a perception experiment, this normalization strategy significantly improves the perceived signal quality.
Similar resources
Optimized photorealistic audiovisual speech synthesis using active appearance modeling
Active appearance models can represent image information in terms of shape and texture parameters. This paper explains why this makes them highly suitable for data-based 2D audiovisual text-to-speech synthesis. We elaborate on how the differentiation between shape and texture information can be fully exploited to create appropriate unit-selection costs and to enhance the video concatenations. T...
Mary101: A Photorealistic Text-to-Audio-Visual Speech Synthesizer
Previous Work: Much of the previous work in text-to-audio-visual (TTAVS) speech synthesis [9] [2] has focused on integrating physically-based facial models with a particular speech synthesis system in order to give the impression of a "talking face". Some TTAVS systems have also resorted to Cyberware scanning techniques to overlay realistic-looking skin texture on top of the underlying graphics ...
Chromatic Adaptation Post-Filtering in Image Synthesis Reproduction of Ancient Building for Restoration Support
Within the field of cultural heritage restoration, experts are interested in the analysis of data describing the condition and history of ancient monuments. Data are usually distributed over many sites. VRML and Java technology are well suited for describing geometrical models and data interaction over the Internet. Unfortunately, the poor quality of VRML real-time rendering is a bottlen...
Visual speech synthesis using statistical models of shape and appearance
In this paper we present preliminary results of work towards a video-realistic visual speech synthesizer based on statistical models of shape and appearance. A sequence of images corresponding to an utterance is formed by concatenation of synthesis units (in this case triphones) from a pre-recorded inventory. Initial work has concentrated on a compact representation of human faces, accommodatin...
Photo-realistic visual speech synthesis based on AAM features and an articulatory DBN model with constrained asynchrony
This paper presents a photo realistic visual speech synthesis method based on an audio visual articulatory dynamic Bayesian network model (AF_AVDBN) in which the maximum asynchronies between the articulatory features, such as lips, tongue and glottis/velum, can be controlled. Perceptual linear prediction (PLP) features from the audio speech and active appearance model (AAM) features from mouth ...
Publication date: 2010